BigDFT.Stats module

A module to describe information coming from ensemble averaging

symmetrize_df(df1)[source]

From a dataframe that should represent a symmetric matrix, construct the symmetrized dataframe
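A minimal pandas sketch of one way to symmetrize such a dataframe. It assumes the input is a square matrix in which only one triangle is filled and the missing entries are NaN. The name `symmetrize_df_sketch` is hypothetical, and this is not necessarily how BigDFT implements `symmetrize_df`.

```python
import numpy as np
import pandas as pd


def symmetrize_df_sketch(df1: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: mirror a half-filled matrix into a symmetric one."""
    # Use the same (union) labels on both axes so df and df.T align entry by entry
    labels = df1.index.union(df1.columns)
    square = df1.reindex(index=labels, columns=labels)
    # Keep existing values and fill each NaN with its mirrored entry
    return square.combine_first(square.T)


upper = pd.DataFrame([[1.0, 2.0], [np.nan, 3.0]], index=["a", "b"], columns=["a", "b"])
sym = symmetrize_df_sketch(upper)
print(sym.loc["b", "a"])  # 2.0
```

`combine_first` keeps every value already present, so entries given in both triangles are not averaged; if both triangles can be filled with slightly different values, `(df + df.T) / 2` would be an alternative choice.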

clean_dataframe(df, symmetrize=True)[source]

Symmetrize a dataframe and remove the NaN rows and columns

Parameters:
  • df (pandas.DataFrame) – the dataframe to be cleaned

  • symmetrize (bool) – symmetrize the dataframe if applicable

Returns:

the cleaned dataframe

Return type:

pandas.DataFrame
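A hedged sketch of the cleaning step in plain pandas, assuming that "NaN rows and columns" means rows and columns made only of NaN, and that symmetrization is "applicable" when both axes carry the same labels. The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def clean_dataframe_sketch(df: pd.DataFrame, symmetrize: bool = True) -> pd.DataFrame:
    """Hypothetical sketch of the cleaning step."""
    # Symmetrize only when it makes sense: same labels on both axes
    if symmetrize and set(df.index) == set(df.columns):
        df = df.combine_first(df.T)
    # Drop the rows, then the columns, that contain only NaN values
    return df.dropna(axis=0, how="all").dropna(axis=1, how="all")


raw = pd.DataFrame({"x": [1.0, np.nan], "y": [np.nan, np.nan]}, index=["r1", "r2"])
cleaned = clean_dataframe_sketch(raw, symmetrize=False)
print(cleaned.shape)  # (1, 1)
```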

stacked_dataframe(pop)[source]

Construct a stacked dataframe with all the data of the population

Warning

Weights are ignored; therefore, the average value of such a stacked dataframe may differ from the population mean.
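The warning can be illustrated with plain pandas. The two-member population and its weights below are hypothetical, and stacking is emulated with `pd.concat` rather than by calling `stacked_dataframe`.

```python
import pandas as pd

# Hypothetical weighted population: two dataframes with weights 0.9 and 0.1
dfs = [pd.DataFrame({"energy": [1.0, 1.0]}), pd.DataFrame({"energy": [3.0, 3.0]})]
wgts = [0.9, 0.1]

# Stacking ignores the weights: every row counts the same
stacked = pd.concat(dfs, ignore_index=True)
unweighted_mean = stacked["energy"].mean()

# Population mean: weight each dataframe's mean by its weight
population_mean = sum(w * df["energy"].mean() for df, w in zip(dfs, wgts)) / sum(wgts)

print(unweighted_mean)            # 2.0
print(round(population_mean, 6))  # 1.2
```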

weighted_dataframe(dfs, wgts)[source]

Construct a single dataframe that includes the provided weights. This is useful whenever one wants a single view of a weighted population.

Parameters:
  • dfs (list) – list of dataframes to be weighted

  • wgts (list) – weights to be included; should have the same length as dfs
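One possible way to build such a single weighted view, sketched under assumptions: each row receives a hypothetical `weight` column equal to its dataframe's weight divided by the number of rows, so that every dataframe contributes its full weight regardless of size. The function name and column name are illustrative, not the BigDFT API.

```python
import numpy as np
import pandas as pd


def weighted_dataframe_sketch(dfs, wgts, column="weight"):
    """Hypothetical sketch: one dataframe with a per-row weight column."""
    if len(dfs) != len(wgts):
        raise ValueError("dfs and wgts must have the same length")
    # Spread each dataframe's weight over its rows so each dataframe totals w
    tagged = [df.assign(**{column: w / len(df)}) for df, w in zip(dfs, wgts)]
    return pd.concat(tagged, ignore_index=True)


dfs = [pd.DataFrame({"energy": [1.0, 1.0]}), pd.DataFrame({"energy": [3.0, 3.0, 3.0]})]
merged = weighted_dataframe_sketch(dfs, [0.9, 0.1])
# A weighted average over the single view recovers the population mean
mean = np.average(merged["energy"], weights=merged["weight"])
```

With this choice, a weighted average over the merged rows equals the weighted mean of the per-dataframe averages, which is exactly the quantity a plain stacked dataframe loses.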

transverse_dataframe(population, target_features, target_samples)[source]

Construct a transverse dataframe that gathers the data of some target features and samples across a population. This may be useful to show the variability of the data for particular entries.
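A hedged sketch of the transverse layout, assuming the population is a dictionary mapping a label to a dataframe with samples as rows and features as columns. The result has one row per population member and one column per (sample, feature) pair; the function name is hypothetical.

```python
import pandas as pd


def transverse_dataframe_sketch(population, target_features, target_samples):
    """Hypothetical sketch: one row per population member, one column per
    (sample, feature) pair."""
    columns = pd.MultiIndex.from_product(
        [target_samples, target_features], names=["sample", "feature"])
    # Iterating a MultiIndex yields (sample, feature) tuples
    values = [[df.at[s, f] for s, f in columns] for df in population.values()]
    return pd.DataFrame(values, index=list(population.keys()), columns=columns)


population = {
    "run1": pd.DataFrame({"f1": [1.0, 2.0], "f2": [5.0, 6.0]}, index=["s1", "s2"]),
    "run2": pd.DataFrame({"f1": [1.5, 2.5], "f2": [5.5, 6.5]}, index=["s1", "s2"]),
}
transverse = transverse_dataframe_sketch(population, ["f1"], ["s1", "s2"])
# The spread of each column across runs shows the variability of that entry
print(transverse.std())
```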

concatenate_populations(populations, extra_labels={})[source]

Write a file containing the set of populations to be serialized

Parameters:
  • populations (dict) – dictionary of the populations, labeled by the key

  • extra_labels (dict) – dictionary of the feature_labels and sample_labels of the populations

Returns:

the concatenated populations

Return type:

pandas.DataFrame
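A minimal sketch of the concatenation, assuming each population is a dataframe and that the dictionary key should become an outer index level. The `extra_labels` argument is not modelled here, and the function name is hypothetical.

```python
import pandas as pd


def concatenate_populations_sketch(populations):
    """Hypothetical sketch: stack the populations under an outer index level."""
    labels = list(populations.keys())
    frames = [populations[label] for label in labels]
    # keys= adds an outer index level holding each population's label
    return pd.concat(frames, keys=labels)


populations = {
    "A": pd.DataFrame({"f1": [1.0, 2.0]}, index=["s1", "s2"]),
    "B": pd.DataFrame({"f1": [3.0]}, index=["s1"]),
}
merged = concatenate_populations_sketch(populations)
print(merged.loc["B"])
```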

dump_populations(populations)[source]

Dump a dictionary of populations into an archive

Parameters:

populations (dict) – the dictionary of the populations

Returns:

the list of filenames (extension ‘.npy’) which have been produced

Return type:

list
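A hedged sketch of dumping populations to `.npy` files with `numpy.save`, one file per population named after its key. The `directory` parameter and function name are hypothetical; note that `.npy` files store only the array values, so index and column labels would need to be saved separately.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def dump_populations_sketch(populations, directory="."):
    """Hypothetical sketch: one .npy file per population."""
    filenames = []
    for label, df in populations.items():
        filename = os.path.join(directory, f"{label}.npy")
        # np.save stores the raw values; index and column labels are not kept
        np.save(filename, df.to_numpy())
        filenames.append(filename)
    return filenames


populations = {"A": pd.DataFrame({"f1": [1.0, 2.0]}), "B": pd.DataFrame({"f1": [3.0]})}
with tempfile.TemporaryDirectory() as tmp:
    files = dump_populations_sketch(populations, directory=tmp)
    names = [os.path.basename(f) for f in files]
    reloaded = np.load(files[0])
print(names)  # ['A.npy', 'B.npy']
```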

safe_moments(data, order, weight)[source]

Calculate a list of the moments of the distribution
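A minimal NumPy sketch of weighted moments, assuming `order` is the highest moment requested, `weight` holds one weight per data point, and raw moments E[x^k] are wanted (central moments would subtract the mean first). The function name is hypothetical.

```python
import numpy as np


def weighted_moments_sketch(data, order, weight):
    """Hypothetical sketch: weighted raw moments E[x**k] for k = 1..order."""
    x = np.asarray(data, dtype=float)
    w = np.asarray(weight, dtype=float)
    # np.average normalizes by the sum of the weights
    return [float(np.average(x ** k, weights=w)) for k in range(1, order + 1)]


moments = weighted_moments_sketch([1.0, 2.0, 3.0], order=2, weight=[1.0, 1.0, 2.0])
print(moments)  # [2.25, 5.75]
```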

safe_unary_op(data, op)[source]

Apply the operation to the data in a dataframe-compatible way
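One dataframe-compatible approach, sketched under assumptions: apply the operation cell by cell, so that cells holding lists or arrays are passed to `op` whole instead of being broadcast by pandas. The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def safe_unary_op_sketch(data, op):
    """Hypothetical sketch: apply op element by element."""
    if isinstance(data, pd.DataFrame):
        # Column by column, then element by element, so that cells holding
        # lists or arrays are passed to op unchanged
        return data.apply(lambda col: col.map(op))
    if isinstance(data, pd.Series):
        return data.map(op)
    return op(data)


df = pd.DataFrame({"a": [[1, 2], [3]], "b": [4.0, 9.0]})
result = safe_unary_op_sketch(df, np.sum)
print(result)
```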

safe_multiply_and_pow(data, a, pw)[source]

Perform a multiplication and power expansion that is resilient to dataframes containing sequences that are not compatible with floating-point numbers
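A hedged sketch of such a resilient operation, assuming the expansion means (a·x)^pw applied to every cell (it could equally be a·x^pw). Each cell, whether a scalar or a sequence, is converted to a float array before the arithmetic; the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def safe_multiply_and_pow_sketch(data, a, pw):
    """Hypothetical sketch: compute (a * x) ** pw for every cell."""
    def op(x):
        # Turn scalars and sequences alike into float arrays before the maths
        result = (a * np.asarray(x, dtype=float)) ** pw
        # Give scalars back as plain floats, keep sequences as arrays
        return float(result) if result.ndim == 0 else result
    return data.apply(lambda col: col.map(op))


df = pd.DataFrame({"x": [1.0, 2.0], "seq": [[1.0, 2.0], [3.0]]})
out = safe_multiply_and_pow_sketch(df, 2.0, 2)
print(out)
```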

class ClusterGrammer(df)[source]

A class that facilitates the use of clustergrammer objects.

Parameters:

df (pandas.DataFrame) – the dataframe to represent in the clustergrammer

categorize(axis, cats)[source]

Define categories to be displayed next to the axis elements

Parameters:
  • axis (str) – ‘row’ or ‘col’

  • cats (dict) – dictionary of the form {label: cats}, where cats is itself a dictionary mapping each key to a list of fragments
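To make the nesting of `cats` concrete, the snippet below builds a hypothetical value (the label "Region", the keys and the fragment names are all illustrative) and inverts it to show which category each fragment ends up in. The helper is a hypothetical illustration, not part of the ClusterGrammer API.

```python
# Hypothetical cats argument: one label ("Region") whose keys list fragments
cats = {"Region": {"core": ["FRA:1", "FRA:2"], "shell": ["FRA:3"]}}


def categories_per_element(cats):
    """Invert {label: {key: [fragments]}} into {fragment: {label: key}}."""
    mapping = {}
    for label, groups in cats.items():
        for key, fragments in groups.items():
            for fragment in fragments:
                mapping.setdefault(fragment, {})[label] = key
    return mapping


print(categories_per_element(cats)["FRA:3"])  # {'Region': 'shell'}
```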

represent_only(axis, elements)[source]

Represent only the elements that are indicated on the given axis

Parameters:
  • axis (str) – ‘row’ or ‘col’

  • elements (list) – list of the elements to be represented on the given axis
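The filtering that `represent_only` describes has a direct pandas analogue, sketched here on a plain dataframe. The function name is hypothetical and this does not call clustergrammer; elements absent from the axis are silently ignored in this sketch.

```python
import pandas as pd


def represent_only_sketch(df, axis, elements):
    """Hypothetical pandas equivalent: keep only the given rows or columns."""
    if axis == "row":
        return df.loc[df.index.isin(elements)]
    if axis == "col":
        return df.loc[:, df.columns.isin(elements)]
    raise ValueError("axis must be 'row' or 'col'")


df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], index=["r1", "r2"], columns=["c1", "c2", "c3"])
sub = represent_only_sketch(df, "col", ["c1", "c3"])
print(sub)
```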

show()[source]

Display the ClusterGrammer

publish(name)[source]

Publish the clustergrammer object and produce a publicly accessible link to it

Parameters:

name (str) – name of the file to be created remotely

Returns:

URL of the link

Return type:

str

dataframe_dendrogram(df, method='average', metric='correlation', optimal_ordering=False, **kwargs)[source]

Dendrogram of a clustered dataframe.
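The default arguments match parameters of `scipy.cluster.hierarchy.linkage`, so a plausible sketch, assuming they are forwarded there, that the rows of the dataframe are clustered, and that `**kwargs` go to `scipy.cluster.hierarchy.dendrogram`, looks as follows. The function name is hypothetical.

```python
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, linkage


def dataframe_dendrogram_sketch(df, method="average", metric="correlation",
                                optimal_ordering=False, **kwargs):
    """Hypothetical sketch: cluster the rows of df and build their dendrogram."""
    z = linkage(df.to_numpy(), method=method, metric=metric,
                optimal_ordering=optimal_ordering)
    # Extra keyword arguments go to scipy's dendrogram (e.g. no_plot=True)
    tree = dendrogram(z, labels=list(df.index), **kwargs)
    return z, tree


df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0],
                   [2.0, 4.0, 6.0, 8.5],
                   [4.0, 3.0, 2.0, 1.0]], index=["a", "b", "c"])
z, tree = dataframe_dendrogram_sketch(df, no_plot=True)
print(tree["ivl"])  # leaf labels in dendrogram order
```

With the correlation metric, rows with constant values have undefined distances, so such rows should be removed before clustering.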

cluster_labels(z, names, t)[source]

Labels for each of the clusters obtained from a dendrogram
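A hedged sketch of deriving cluster labels from a linkage matrix with `scipy.cluster.hierarchy.fcluster`, assuming `t` is a distance threshold at which the tree is cut and that each name corresponds to one leaf. The function name is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_labels_sketch(z, names, t):
    """Hypothetical sketch: group names by flat cluster at threshold t."""
    # criterion="distance": cut the tree at cophenetic distance t
    ids = fcluster(z, t, criterion="distance")
    clusters = {}
    for name, cluster_id in zip(names, ids):
        clusters.setdefault(int(cluster_id), []).append(name)
    return clusters


data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.5],
                 [4.0, 3.0, 2.0, 1.0]])
z = linkage(data, method="average", metric="correlation")
clusters = cluster_labels_sketch(z, ["a", "b", "c"], t=0.5)
print(clusters)
```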

class CData(datas)[source]

A class that facilitates the calls to data clustering algorithms.

Parameters:

datas (array-like) – the data samples on which to perform the clustering
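A minimal sketch of such a wrapper, using k-means from `scipy.cluster.vq.kmeans2` as one example algorithm. The class name `CDataSketch` and its `kmeans` method are hypothetical and do not reflect the actual `CData` interface; explicit initial centroids (`minit="matrix"`) keep the result deterministic.

```python
import numpy as np
from scipy.cluster.vq import kmeans2


class CDataSketch:
    """Hypothetical wrapper: keep the samples, expose clustering calls."""

    def __init__(self, datas):
        # Store the samples as a 2D float array (one row per sample)
        self.datas = np.asarray(datas, dtype=float)

    def kmeans(self, initial_centroids, iterations=10):
        """k-means started from explicit centroids (minit='matrix')."""
        centroids, labels = kmeans2(
            self.datas, np.asarray(initial_centroids, dtype=float),
            iter=iterations, minit="matrix")
        return centroids, labels


cdata = CDataSketch([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
centroids, labels = cdata.kmeans([[0.0, 0.0], [5.0, 5.0]])
print(labels)  # [0 0 1 1]
```

Keeping the samples on the instance lets several algorithms be called on the same data without passing it around each time.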